Metabolic profiling during COVID-19 infection in humans: Identification of potential biomarkers for occurrence, severity and outcomes using machine learning

Background After its emergence in China, the coronavirus SARS-CoV-2 has swept the world, leading to global health crises with millions of deaths. COVID-19 clinical manifestations differ in severity, ranging from mild symptoms to severe disease. Although perturbation of metabolism has been reported as a part of the host response to COVID-19 infection, scarce data exist that describe stage-specific changes in host metabolites during the infection and how this could stratify patients based on severity. Methods Given this knowledge gap, we performed targeted metabolomics profiling and then used machine learning models and biostatistics to characterize the alteration patterns of 50 metabolites and 17 blood parameters measured in a cohort of 295 human subjects. They were categorized into healthy controls, non-severe, severe and critical groups with their outcomes. Subject’s demographic and clinical data were also used in the analyses to provide more robust predictive models. Results The non-severe and severe COVID-19 patients experienced the strongest changes in metabolite repertoire, whereas less intense changes occur during the critical phase. Panels of 15, 14, 2 and 2 key metabolites were identified as predictors for non-severe, severe, critical and dead patients, respectively. Specifically, arginine and malonyl methylmalonyl succinylcarnitine were significant biomarkers for the onset of COVID-19 infection and tauroursodeoxycholic acid were potential biomarkers for disease progression. Measuring blood parameters enhanced the predictive power of metabolic signatures during critical illness. Conclusions Metabolomic signatures are distinctive for each stage of COVID-19 infection. This has great translation potential as it opens new therapeutic and diagnostic prospective based on key metabolites.


Introduction
Since its first emergence in Wuhan city, China in December 2019, severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) continues to be a public health threat and is still spreading around the world, especially with new variants [1,2].As of May 2023, more than seven hundred million cases were confirmed by the WHO with ~7 million were confirmed deaths globally [3].The infection with such virus has greatly imposed financial and social burden for countries and individuals throughout the world attributed to its disastrous consequences on both patients and their families [4].Egypt has been one of the countries that was impacted by COVID-19 pandemic, with up-to-date number of cases slightly exceeding half million, of those patients ~5% died from the disease [3].
Most individuals who acquire SARS-CoV-2 infection experience mild disease.However, estimates of ~14% of COVID-19 patients could develop severe disease and ~5% experience shock or multiple organ failure, as well as lung dysfunction necessitating ventilation, especially those with complications or those suffering acute respiratory distress syndrome (ARDS) [5].With the virus trying to hijack the host machinery for survival purposes, the infection with SARS-CoV-2 triggers a plethora of host responses [6], of which metabolome represents one important component.The repertoire of human metabolomes represents an ensemble of several thousands of molecules that cover an amble range of concentration (from <1 nM to >1 μM) and are produced by either the host genome or the genome of host microflora [7].The blood is the primary carrier of host metabolites, the relative concentration of which mirrors the patho-physiological status of an individual, and thus could inform virus-induced tissue lesions and organ failure.It has been reported that an organism's metabolome is a more accurate gauge of its metabolic state than its proteome or transcriptome [8].
Changes in endogenous metabolites is one of the characteristics of COVID-19 infection [9][10][11].The intensive hypoxia associated with COVID-19-induced lung impairment possibly leads to altered metabolic profile [12].Reports have indicated reductions in levels of some metabolites in severe COVID-19 infection in patients suffering from diabetes or hypertension [13].Therefore, investigating the alteration in metabolites during COVID-19 infection could unravel important aspects of disease mechanisms such as revealing diagnostic or prognostic metabolic markers and discriminating patient groups based on disease severity [10,14,15].Along with metabolite changes, hyperinflammation and over production of cytokines were observed during COVID-19 infection and has been recognized as a major cause of mortality in COVID-19 patients [16,17].
There have been multiple studies that profiled metabolites in COVID-19 patients aiming to seek novel biomarkers or stratifying the patients.However, most of these studies did not reflect on the stage-to-stage differential regulation of metabolites during disease progression.Indeed, these studies have either compared controls vs. patients at each of the severity stages [18], simultaneously compared between all stages [14], or only contrasted controls with COVID-19 +ve patients without stage definition [19,20].Indeed, the genesis of COVID-19 is a multistep process that progresses over time [15,18].It is generally assumed that different stages of virus replication cycle from entry to virus release are entirely fueled by host's cell energy and metabolic resources [21].This has already been demonstrated for SARS-CoV-2 infection in a robust animal model that mimics humans [22] and is published in the most recent WHO guidelines of patient stratification [23].This already suggests that stage-to-stage reprogramming in host metabolites during COVID-19 infection is valid and worth investigated [18].The accurate classification of patients group, however still challenging, particularly because of the wide and overlapped spectrum of patient's symptoms and, as a result, different pathophysiological pathways that are being affected and interrupted during disease progression.Although few metabolomics studies have used nuclear magnetic resonance (NMR) [24], the preferred method for exploring potentially diagnostic biomarkers in COVID-19 disease has been mass spectrometry (MS)-based metabolomics.Even though liquid chromatography (LC) coupled with MS has been used in many of these studies, gas chromatography and MS have been shown to produce intriguing results about the evolution of illness [25].
In the current study, a targeted metabolic analysis was applied on a cohort of Egyptian subjects who exemplified consecutive stages of COVID-19 infection with the purpose of determining the most promising metabolites that could be used as biomarkers for disease occurrence, progression, and outcome.We also tested, by data analytics and machine learning models, how this could inform patient disease stage and whether the addition of blood indices measured on the same subjects could substantiate the clinical biomarker utility of the identified metabolites.

Study design and study participants
From November 2021 to May 2022, 200 patients with confirmed COVID-19, who were admitted to Menoufia University Hospitals, Menoufia province, Egypt, were recruited.All patients had a clinical suspicion of COVID-19 and were confirmed to be positive using PCR applied on nasopharyngeal and oropharyngeal samples.The study subjects (initial number = 300) were categorized into 4 groups (100 healthy control (HC), 100 non-severe, 50 severe and 50 critical).This classification follows the last updates of WHO "COVID-19 Clinical management: Living guidance, 25 January 2021" [23].The non-severe COVID-19 patients were described as having neither severe nor critical COVID-19 criteria.The severe COVID-19 included patients with any of the following criteria: Oxygen saturation < 90% on room air; in adults, signs of severe respiratory distress (accessory muscle use, inability to complete full sentences, respiratory rate > 30 breaths per minute), in addition to the signs of pneumonia.The critical COVID-19 patients should meet the criteria of acute respiratory distress syndrome (ARDS), sepsis, septic shock or other conditions that would normally require the provision of life-sustaining therapies such as mechanical ventilation (invasive or non-invasive) or vasopressor therapy.Patients with known history of hepatic, renal, cardiac diseases and coagulation disorders were excluded from the study.The HC included apparently healthy healthcare workers with no evidence of COVID-19 infection by standard clinical criteria and laboratory investigation.Patient's demographic and clinical data were also included in the study for analysis (Table 1).
Ethical approval: The study was conducted in accordance with Helsinki Declaration and was approved by National Liver Institute Ethical Committee (IRB: NLI 00003413).Prior to enrollment, each participant was informed about the aims of the study and was offered the chance to sign their informed written consent.

Clinical assessment and samples collection
Patients with clinical suspicion of COVID-19 were assessed upon attending to the COVID-19 isolation unit at the Faculty of Medicine, Menoufia University Hospital throughout the research period.Following assessments by clinical, laboratory, and radiological means, patients were classified as non-severe, severe, or critical.For cases that were considered to be nonsevere, an outpatient treatment prescription was provided, and further follow-up was conducted over the phone or in the COVID-19 outpatient clinic.Those with severe or critical presentations were admitted to the COVID-19 quarantine ward or ICU, where baseline clinical, laboratory, and radiological data were recorded at the time of admission.Additionally, a daily evaluation of the course of the illness, the response to treatment were evaluated and recorded.Blood samples (10 ml) from all subjects were drawn from the cubital vein by venipuncture after fasting at the time of admission/diagnosis.Of these, 5 ml were collected in a plain vacutainer tube and allowed to clot at room temperature then centrifuged (3000 rpm, 5 min.)and the clear supernatant sera were separated and collected in 3-aliquots.The 1 st serum aliquot was used to measure ferritin, procalcitonin, c-reactive protein (CRP), LDH (lactate dehydrogenase), liver enzymes (ALT, AST), and kidney function tests (urea and creatinine).The 2 nd serum aliquot was used to measure IL-6 using human IL6 ELISA kit following manufacturer instructions purchased from Thermofisher Scientific, US (Catalog no.EH2IL6).The 3 rd serum aliquot was kept at -80 for bile acids analysis.From the remaining 5 ml, 2ml were collected into a tube containing Ethylene diaminetetraacetic acid (EDTA) for the complete blood count (CBC), 1 ml was spotted on filter paper (903 Whattman paper, NJ, USA), left to dry on a clean surface for 6 hours, and then stored at −80 C˚until analyzing amino acids, carnitine, and acyl carnitine and the final 2 ml were collected into sodium citrate tubes for D-dimer, prothrombin time, and INR measurements.

Targeted metabolomics using ultra-performance liquid chromatography tandem mass spectrometry (UPLC MS/MS)
2.3.1 Amino acid and carnitines quantification.Amino acid and blood carnitine and L carnitines were measured using MassChrom1 Amino Acids and Acylcarnitines from Dried Blood / Non Derivatised-LC-MS/MS (order No.: 57000/F, Chromsystems Instruments & Chemicals GmbH, Germany).Three mm of the dried blood spot disk was punched into a well of the v-bottomed plate containing 100 μl of lyophilized internal standard reconstituted with 25 ml Extraction Buffer.The plate was sealed with a protective sheet and agitated at 600 rpm for 20 min at room temperature.The supernatant was transferred to a new v-bottomed well plate and covered by aluminum foil sheet.Ten μl of the elute was injected into the MS/MS system at a two-min interval in a flowing stream of 80% acetonitrile at a flow rate of 200 μl/min and reduced to 20 μl/min in 0.25 min.The flow rate increased to (600 μl/min in 1.25 min) then decreased again to (200 μl/min).The scan time of the MS/MS system was 1.2 min.The obtained spectra of all analytes analyzed with multiple reactions monitoring (MRM) mode.The quantitative analysis was achieved using Neolynx software (Neolynx Inc., Glendale, CA, USA) by comparing the signal intensity of an analyte against the corresponding internal standard.The quantification included 14 blood amino acids, 26 blood carnitine and L carnitines (S1 Table ).

Bile acids quantification.
We used standards for the bile acids listed in S1 Table.These were purchased from Sigma-Aldrich Chemicals (Merck KGaA, Darmstadt, Germany).Sample preparation for bile acid quantification was done according to [26] with modification.First, 100 μL blood sample was added to 400 μL ice-cold methanol to precipitate the sample proteins.The mixture was then vortexed, centrifuged (13500 rpm for 15 minutes) and the supernatant was obtained and centrifuged (13500 rpm for 15 minute).Finally, 50 μL of the final supernatant was mixed with 100 μL water/formic acid (1000: 1, v/v).The solution was then injected into LC/MS/MS system.Stock solution of 14 individual bile acids standards were dissolved separately in methanol to form of 10 mmol/L, and then stored at −20˚C [26].The individual stock solutions were then pooled together to obtain mixture of 50 μmol /L in (50:50) deionized water and acetonitrile.Eight-point standard solutions ranging from 0, to 40 μmol/L were prepared by adding appropriate amounts of the mixture 50 μmol /L solution into the bile acid free pooled serum for external standard calibration.5μmol/L of individual stock solutions was injected to LC. Chromatographic separation was carried out on a triplequadruple tandem MS.The analytical column was ACQUITY UPLC BEH C18, 1.7 μm, 2.1x50mm, column (Waters) at 50˚C. 5 μL of samples were eluted with a gradient at a flow rate of 0.28 mL/min.Mobile phase A was water/formic acid (1000: 1, v/v) and mobile phase B was acetonitrile.The elution started with 80% mobile phase A and 20% mobile phase B for an initial 2.1 min after injection, then with a linear gradient of mobile phase B of 20% to 30% over 5.2 min, followed by mobile phase B at 80% over 8 min, which was held for 0.5 min.the column was equilibrated with 80% mobile phase A for 2 min before the injection of the next sample.

Data analysis and machine learning models
The overall study design is shown in Fig 1 .Initially, 54 metabolites (14 serum bile acids, 14 blood amino acid, 26 blood carnitine and L carnitines) that were measured in 300 subjects (100 HC, 100 non-severe, 50 sever and 50 critical) were included in the analyses.Principle component analyses (PCA) was used to visualize outliers in subjects and metabolites.Outliers were subsequently detected using boxplot with whisker.These analyses were done using base R functions in R software version 4.3.1.The raw concentration of metabolites was median normalized, log-transformed and then scaled by mean centralization divided by standard deviation of each variable.The normalized counts were used for further analyses.To compare metabolites concentration among and within study groups, we run 3-machine learning models; Partial least square discriminate analyses (PLS-DA), its orthogonal version (oPLS-DA) and random forest (RF) models.The parameters used to evaluate the performance of the PLS-DA model were accuracy, Q2 (classification ability) and R2 (predictability) [27].These were generated with 3-component (to avoid over-fitting) and 10-fold cross validation.The performance of the oPLS-DA model was assessed using R2Y (variance among subjects as explained by the model) and Q2Y (model predictability) parameters.The significance of the oPLS-DA model was assessed at 0.5 for R2Y and Q2Y [28,29].The RF model was applied using an ensemble of 1000 trees.The mtry function (R software version 4.3.1,random forest package [30]) was used to set the reliable number of variables randomly sampled as candidates at each split (square root of number of variable).For univariate analyses, data normality was calculated using Shapiro wike test [31], and student t-test was run to obtain significant differences between pairs of groups.One-way ANOVA (or its non-parametric mate Kruskal-Wallis test) was used whenever needed after correction for multiple comparison testing using Dunn's test.Adj.P-value <0.05 was considered significant.To obtain the degree and pattern of differential expression in metabolites, data (non-normalized for features) were used to calculate the ratio between each group pairs.A fold change value of |1.5| and adj.P-value of 0.05 was used as cutoffs for significance.These data were visualized using volcano plots generated in R software version 4.3.1,package ggplot2 [32].The finally selected panel of key diagnostic or prognostic metabolites should meet the following criteria: Fold change value of 1.5 (using only those that are significant with adj.P-value < 0.05), oPLS-DA-revealed VIP score > 1 and should be among top 15 metabolites as revealed by the RF model (based on the mean decrease in accuracy).ROC curves for comorbidities were visualized as fraction and its confidence intervals (CI) were estimated using Wilson/Brown method, and significant AUC was set at a cutoff = 0.05.Pathway enrichment analyses were done using Metaboanalyst v.5 [33] and is based on combined functional enrichment analyses and network topology.Hypergeometric test was used for functional enrichment analyses.Significant pathways were selected based on adj.P-value = 0.05.

Exploratory data analyses
To exclude unwanted variation that could bias the analyses, PCA plot based on whole metabolites, metabolites subclasses, blood indices and the subjects were used to identify potential outliers.The analyses revealed some outliers within HC and severe groups.To more accurately identify outlier subject, mean normalized counts of metabolites in each class were visualized using boxplot with whiskers.Based on these analyses, 5 subjects (4 severe covid-19 patients based on amino acid profile and 1 HC based on the bile acid profile) were excluded.Other carnitines were also excluded because they were not detected in all subjects (e.g.tetradecanoyl carnitine, tetradecadienoyl and 3-hydroxyoctadecenoylcarnitine) or because it gave no signals in 94.3% (n = 283) of the subjects and had very low concentration in the remaining subjects (e.g.3-Hydroxyhexadecanoyl).After outliers' removal, a total of 295 subjects (99 HC, 100 nonsever, 46 sever and 50 critical), 50 metabolites (14 amino acids, 14 bile acids and 22 carnitines) and 17 blood parameters entered the formal analysis (S1 Table ).The counts of metabolite and their subclasses were successfully normalized (S1 Fig).

Demographic, clinical characteristics, laboratory values and disease outcomes of study participants
After outlier removal, the clinical cohort presented in this study consisted of 295 subjects, whose demographic and clinical characteristics are shown in Table 1 and S2 and S3 Figs.The severe and critical patients have older participants than other groups, with their ages being significantly higher than those in the HC group (P-value < 0.0001).The whole cohort contained significantly (P-value = 0.0002) more females (n = 170, 57.6%) than males (n = 125, 42.4%) and female patients (n = 121) were significantly (P-value = 0.04) more than male patients (n = 75).Upon admission, the COVID-19 patients were evaluated for severity by COVID-19 Reporting and Data System (CO-RADS), and severity assessment.It was found that 53.5% of the patients had a CO-RADS score of 5, the majority of whom (77.1%) presented with critical (n = 41) or severe (n = 40) disease.All the COVID-19 patients were symptomatic, showing at least one symptom (S2F Fig), with fever being the most represented symptom (80.1%) followed by dyspnea (76.5%) and dry cough (75.5%).The highest recorded symptoms in the nonsevere group were dry cough and fever recorded in 90 and 73% of these patients, respectively, whereas both fever and dyspnea represented the top symptoms in both severe (89.1, 80.4%, respectively) and critical COVID-19 patients (86, 84%, respectively).The number of individuals with diabetes, hypertension, or both together varied significantly among all groups (Pvalue < 0.01) (Table 1).Out of the 196 COVID-19 patients, 40.3% (n = 79) had no comorbidities, 56.1% (n = 110) had diabetes, 29.1% (n = 57) had hypertension, and 25.5% (n = 50) had both diabetes and hypertension.The presence of these comorbidities varied according to COVID-19 severity.Half of non-severe (51%), close to half of severe (43.4%) and only 16% of critical COVID-19 patients had neither diabetes nor hypertension.The representation of diabetes was higher than hypertension in all severity groups with hypertension being nonreported in sever patients (S3A Fig) .We observed an upward increase in the proportion of patients with concurrent diabetes and hypertension as the disease severity increases with those having concurrent diabetes and hypertension representing considerably high proportion of severe (28.2%) and critical (54%) COVID-19 patients.We were also able to follow the patient's outcomes.There was significant association between disease stage and the outcome (P-value = <0.0001).The majority (78.5%) of severe COVID-19 patients survived the infection, whereas the majority (74%) of the critical cases died (S3B Fig) .Seventeen blood parameters were measured at the time of patient admission (S2 Table and S4 Fig) .All blood parameters showed significant differences among the studied groups, in particular comparing HC and non-severe groups.However, the trends in among-group differences were characteristic for some parameters.For instance, inflammation-related indices (e.g.serum ferritin, CRP, LDH, procalcitonin, and D-dimer) showed an increase as the patients exhibited more severe disease, yet they exhibited a significant increase in severe compared to non-severe cases (S2B Table ).IL-6 showed a significant increase in severe and critical cases over the HC but was none significantly altered between severe and critical patients.Furthermore, lymphocytes showed significant (Pvalue = 0.006) reduction in severe compared to non-severe patients, whereas creatinine showed the reverse pattern.INR was the only parameter that showed a significant increase in critical patients over the severe ones.

Diagnostic and prognostic machine learning models
Based on the normalized counts of all metabolites, the initial PCA revealed clear separation between HC and COVID-19 patients with various degrees of severity (i.e.non-severe, severe and critical patients), which rather appeared to overlap.This holds true when blood parameters are added to metabolites (S5 Fig) .Comparable separation patterns were seen when the same analyses were applied to each metabolite subclass and blood parameters (S6 Fig) .Comparing all groups, the PLS-DA model applied to the validation data reflected the results from the PCA, with moderate performance (accuracy: 0.67, R2: 0.7, Q2: 0.7, permutation test: Pvalue < 0.05) (S7 Fig and Table 2).When applied to the validation data, the RF classification model outperformed the PLS-DA model, producing a significantly higher classification accuracy of 0.97 (CI: 0.9-0.9),P-value < 0.0001, and misclassification error of only 0.02.

Models to predict COVID-19 occurrence (distinguishing HC from non-severe patients).
To determine how changes in metabolites could discriminate HC from non-severe patients, who are at the early infection phase (i.e., suffering mild and moderate disease), we trained and validated several machine learning models on the data comparing these two categories.As shown in S8A Fig and Table 2, the PLS-DA gave a clear separation between both groups (accuracy = 1, R2 = 0.95, Q2 = 0.95 at the 1 st component).Applying the oPLS-DA model further supports this (R2Y and Q2Y = 0.95, P-value = 5.152e-16).As shown in Fig 2A and Table 2, adopting the RF model validated the results of the previous models, yet with better and significant differential clustering with accuracy, sensitivity, and specificity equal to 1.To exclude that this is an overfitting problem, we rerun the RF model using different proportions of training and validation data sets, which gave the same results.The predictability of this RF model was high as determined by ROC analyses (Fig 2B).To determine key metabolites that drive the separation between HC and non-severe groups, we combined the prediction results of RF, oPLS-DA models and univariate analyses.The oPLS-DA model predicted 18 metabolites with VIP score > 1 (S3 Table ), the top of which were arginine, malonyl methylmalonyl succinylcarnitine, and tyrosine.The RF model determined key metabolites with high discriminatory ability as determined by their mean decrease in accuracy and GINI indices (values of both indices are listed for all metabolites in S4 Table ).The top 15 metabolites that have the highest mean decrease in accuracy by RF mode are visualized in Fig 2C .The highest 3 metabolites by mean decrease in accuracy were malonyl methylmalonyl muccinylcarnitine (carnitines), glycodeoxycholic acid (bile acid) and arginine (amino acid).The first two of these were also the top metabolites based on mean decrease in GINI index (S4 Table ).The results of univariate analyses are shown as volcano plot in Fig 2D and S5 Table .Out of the 50 metabolites, 37 (74%) were significantly differentially expressed (DE) between HC and non-sever groups (16 up regulated and 21 down regulated).These were almost evenly represented across metabolite categories (11 bile acids, 12 amino acids and 14 carnitines).Intersecting the results of RF, oPLS-DA models with univariate analyses generated a list of 15 metabolites that are important predictors for non-severe disease (Panel 1) (Table 3, S9 Fig) .The metabolic pathways associated with these predictor metabolites are shown in Table 4. Aminoacyl-tRNA biosynthesis and valine, leucine and isoleucine biosynthesis were significantly enriched in this group.Phenylalanine, tyrosine and tryptophan biosynthesis pathway has the highest impact but was non-significantly enriched (adj.P-value >0.05).To obtain key metabolites with high prognostic value for severe disease, oPLS-DA model firstly revealed 20 prognostic metabolites with VIP score > 1, where glycodeoxycholic acid, taurodeoxycholic acid and tyrosine scored the top (Full list are in S3 Table ).Applying RF models produced a list of key metabolites with high predictive value for severe disease (Full list are in S4 Table ).The top 15 metabolites identified by RF model ranked by mean decrease in accuracy are shown in Fig 3C .The highest three of which were bile acids (nam+ely: lithocholic acid, taurolithocholic acid and taurodeoxycholic acid).Lithocholic acid and taurodeoxycholic acid were also the top ones as determined by mean decrease in GINI index.The univariate analyses performed on the same pairwise comparison identified 39 metabolites (78%) as being significant DE between non-severe and severe patients (Full list are in S5  Combining the lists of metabolites that are predicted by RF, oPLS-DA models and univariate analyses obtained a panel of 14 metabolites that are important predictors for severe disease (Panel 2) (Table 3       univariate analyses produced only two metabolites that are important predictors for disease outcome (Panel 4) (Table 3).

Metabolic profiling in critical COVID-19 patients with and without comorbidities
Since metabolic disorders are known to be risk factors for progression of COVID-19 infection, we investigated the enrichment of diabetes and hypertension in our cohort, focusing on the critical patients group.Interestingly, PCA plot on this patient group suggests that the profile of the studied metabolites did not discriminate patients with only critical COVID-19 infection from those having critical COVID-19 infection with comorbidities.RF model run on the same data indicated a slight overlap between those having critical COVID-19 infection only from critical COVID-19 patients with comorbidities (accuracy = 0.93, P-value = 0.005).ROC analyses revealed that having hypertension alone or diabetes plus hypertension are significant predictors for critical COVID-19 infection (P-value < 0.05), with moderate AUCs of 0.7 and 0.6, respectively (S14 Fig) .Given the significant proportion of critical COVID-19 patients with comorbidities (S3A Fig) , it could be that the 2 predicted significant metabolite (panel 3 in Table 3) are not prognostic markers for critical COVID-19 infection only, but are rather markers for this infection superimposed with comorbidities.To investigate this further, the levels of these 2 metabolites were compared across all categories within the critical COVID-19 patients (Fig 6).The level of taurochenodeoxycholic acid did not differ significantly across groups, whereas malonyl methylmalonyl succinylcarnitine showed significant increase in patients with critical COVID-19 plus hypertension over those with critical COVID-19 alone.

Diagnostic and prognostic power of combined blood indices and metabolites models
We aimed to investigate whether combining measurements of blood and metabolites and use these as inputs in one model (combined model) would enhance the classification and predictability of COVID-19 infection compared to cases when only metabolites are used (single model).The results of different comparisons are detailed in Table 2, S7 and S10 Figs.Considering all subject groups, PCA showed no enhancement in the among-groups separation in the  2) was not applicable as it is limited to pairwise comparison.Interestingly, both PLS-DA and oPLS-DA models revealed reduced accuracies, predictability and classification ability of combined model compared to single model when contrasting HC vs non-severe and non-severe vs severe groups, whereas performance of RF remained the same (Table 2).When comparing severe and critical COVID-19 cases, PLS-DA and oPLS-DA demonstrated an enhancement in model performance in the combined model compared to single model (Table 2 and S10B Fig) .In particular, the accuracy, interpretability (R2) and predictability (Q2) of PLS-DA models increased by about 30.1, 14.8, 200%, respectively in the combined model.Along the same line, oPLS-DA showed an increase of 14.8 and 60% in overall variance that is explained by all features between severe and critical groups (R2Y) and the goodness of prediction (Q2Y), respectively.However, RF model showed no enhancement in the combined model over the single one with similar accuracy of 0.92, yet its predictability for combined model exhibited 10.2% increase using the AUC (AUC = 0.97) value over that of the single model (AUC = 0.88).
To more accurately identify the combination of biomarkers that would result in better classification and prediction of critical cases, we build several linear support vector machine models (each with multiple combination of these features) using a combination of the 3-important metabolites (panel 3) and the measured blood indices (n = 17) to build and compared their performance and predictive power using confusion matrices and multivariate ROC analyses.These analyses indicated that increasing number of feature combinations resulted in a gradual, yet slight enhancement in predictability and accuracies of the models.The best model was obtained when all the 20 features were compiled revealing the highest accuracy of 91% (Fig 7A ) and the highest predictability with AUC = 0.9 (Fig 7B).

Correlation of blood parameters with key metabolites in different COVID-19 patients
Here we tried to determine if correlation between blood indices and key metabolites remains the same in different patients' group.The selection of significant correlation between pairs of metabolites and blood parameters was applied using stringent significance criteria (Pvalue < 0.001) and limiting this to the top 10 positive and 10 negative correlations.The analyses showed that none of the significant correlation partners that appeared in the HC was presented in other groups (S6 Table ).Likewise, each of the COVID-19 stages showed unique pair of correlations.In particular, 80, 85 and 75% of correlation pairs appeared uniquely in nonsevere, severe and critical groups, respectively.

Discussion
In this study, we aimed to investigate whether the reprogramming in metabolites that occur during COVID-19 infection could enable determining patients with varying degrees of disease severity.This would allow identifying key metabolites that are important diagnostic or prognostic biomarkers.Determining stage-specific changes in metabolites could enable informed decisions of hospital discharge or help modulating disease treatment if the infection progresses.In addition, understanding the alteration patterns in circulating metabolites during COVID-19 infection would possibly devise novel anti-SARS-CoV-2 therapeutics as shown previously [34,35].

Demographic data of the study participants
As expected, older patients were more enriched in the group with more severe forms of the disease (i.e.those in the severe and critical stages), with significant differences in age between severe and critical cases.Age has been already known as a good predictor of COVID-19 severity [36,37].Despite the reports about sex-induced changes in lipid, amino acid, and other metabolites, there have been reports about discrepancies over the importance of sex and age as determinants of disease occurrence and fatality [38] or non-significant gender-related differences [18].This suggests the importance of seeking other predictors of COVID-19 occurrence and severity.

How different is our design and analyses scheme from other research?
Machine learning models have been successfully applied on metabolomics data for predicting COVID-19 disease occurrence [11], severity and evolution [14,15,18].However, some of these studies employed a single ML model possibly because of the inclusion of large number of patients [10,18].In the current study, we combined the results from two common and robust ML models; the oPLS-DA and RF models, with results of univariate analyses to ensure constructing a more trusted and accurate multivariate prediction and classification scheme.Although the oPLS-DA model applied herein gave a good performance, our data indicate that the application of the non-linear more complex RF model over performed oPLS-DA (Table 2) in all comparisons.Indeed, RF model deems to be more suitable as its complex non-linear algorism fits the non-linear nature of most biological data [39] as stated previously [11].RF model is known to be tolerant for outliers and is robust to over-fitting as shown on simulated and real data [40].Using these models, our results suggest that HC were well discriminated from all other COVID-19 patients based on all metabolites or metabolite subclass indicating that the onset of COVID-19 infection is associated with strong metabolic footprint.With the intension of determining the stage-specific alteration in metabolites, we opted to run pairwise comparisons between consecutive disease stages.

Value of metabolites as biomarkers of early COVID-19 infection phase
In the current study, all applied ML models, and in particular the RF, showed clear discrimination between HC and non-severe COVID-19 patients even when the RF model was run on each metabolite subclass.This suggests that changes in metabolome could signal early phase of COVID-19 infection before sever disease develops.Previous data showed that metabolites change is able to distinguish COVID-19 patients from healthy subjects [15,41].Lo ´pez-Herna ´ndez, Yamile ´et al. showed that PCR+ non-hospitalized patients are well separated from matched controls with high accuracy of 0.88, R2: 0.8, Q2: 0.5 [37].Similarly, Meoni et al used RF model to show a high discrimination between HC and COVID-19 patients using metabolites and lipoprotein parameters with high accuracy (0.87 to 0.91) [11].Metabolites have shown dramatic changes at early COVID-19 infection even in the absence of clinical signs or changes in blood indices [42].We observed that 74% of the metabolites were significant DE in the univariate analyses, with high fold change, especially malonyl methylmalonyl succinylcarnitine (4.9-fold change over HC) suggesting that the underlying reprogramming in metabolites was intense in the non-severe patients relative to HC.Our analyses showed that the differences between HC and non-severe patients originate from changes in a pool of 15 metabolites (panel 1) that are potential markers for disease initiation.Some of the metabolites from this panel were reported previously by others.For instance, methionine has been identified as a robust marker for COVID-19 occurrence in two subsequent COVID-19 waves [43].Leucine and isoleucine were among the identified metabolites in positive COVID-19 patients, yet the difference is that leucine was down regulated in our study [44].Our data highlights the differential regulation (mainly down-regulation) of bile acids upon COVID-19 infection in nonsever patients (except for chenodeoxycholic acid).In alignment, bile acids have been shown previously to be perturbed in COVID-19 infection and they were down regulated [45].However, up regulation in bile acids was also observed in other studies [46,47].The role of bile acids in COVID-19 pathogenesis has been puzzling.Generally, bile acids can limit in-vitro replication of some viruses (e.g.herpes simplex [48] and influenza A virus [49]) and promote in-vitro replication of other viruses such as hepatitis B and C [50].Administration of antibiotics to SARS-CoV-2-infected mice resulted in reduction in certain microbiome that metabolizes primary bile acid to secondary bile acids.This leads to accumulation of primary bile acids, which subsequently were found to inhibit nsp15 endoribonuclease of the virus [51].Furthermore, infection with SARS-CoV-2 itself causes reduction in the diversity of gut microbiome as shown in human patients [52] and primates [53] and microbiota in human gut are known to process primary bile acids into secondary bile acids such as deoxycholic acid and ursodeoxycholic/ chenodeoxycholic acid [54].Bile acids are also biologically active molecules that organize a variety of immune functions, including inflammatory responses.Ursodeoxycholic acid does, in fact, have anti-inflammatory, antioxidant, anti-apoptotic as well as immunomodulatory properties [55].Taken together, it is plausible to assume that the down regulation in bile acid in our study could be virus-induced to facilitate infection.
Except for malonylcarnitine (C3-DC) which showed strong up regulation, carnitines in non-severe group were down regulated.They were slightly up regulated in subsequent stages.Our data remained speculative and did not allow explaining why we observe such a variable DE pattern.Generally, L-carnitines tend to reduce inflammation and oxidative stress [56] and stimulate immunity by improving neutrophil and macrophage function [57].Recently, it has been reported that the increase in the carnitine amount is associated with lower vulnerability to COVID-19 [58].Therefore, we could assume that the host triggers expression of some carnitines (e.g.malonylcarnitine) at early disease phase as a defensive mechanism, while virus tries to down-regulate other carnitine species as the disease progresses.The down regulation of arginine in the non-severe patients in our study complements previous observation in both adults and children infected with COVID-19, who had substantially decreased levels of plasma l-Arginine than controls associated with low l-Arginine-to-ornithine ratio [59].

Value of metabolites in predicting COVID-19 severity
One core intension behind the current work was to determine how changes in metabolite could inform or explain the array of severity degrees seen in patients, and subsequently reveal prognostic biomarkers [16,41].An obvious observation was a gradual decrease in the ability of ML models to classify or predict patients as the severity increases (Table 2).In parallel, the number of significant DE metabolites declined from 39 metabolites (78%), when contrasting non-severe and severe patients, to 3 metabolites (8%) when comparing severe vs critical patients.The magnitude of fold changes of these metabolites was also decreasing for most metabolites.This was also observable when analyzing metabolite subclasses.Regarding the comparison between non-severe and severe groups, our results partially agree with that reported in another study [15], where partial separation existed between symptomatic mild and more severe form of COVID-19.Similarly, PLS-DA model showed certain degree of overlap between mild (non-hospitalized) and both severe (hospitalized) plus critical (intubated) patients [37].Of note, direct comparison among studies could be biased, especially if done on different populations because of the imminent influence of individual's genetic backup on metabolome [60], in addition to other confounders (e.g.environment and life style).The studies also differ largely in the criteria of defining severity scale of patients.Interestingly, we observed considerable overlap between severe and critical patients based on metabolite changes and only small fraction of metabolites were significant DE (n = 3).Here, the predictive ability of the RF model was the lowest (AUC = 0.8) as compared to other comparisons.Although our study did not follow up the same patient, these results suggest that changes in metabolites at the peak of COVID-19 severity might be minimal and non-reflective of patient stage of severity.Similar results were obtained by Gu et al. in China, who found low separation between sever and critical groups [15].Our blood analyses reflected the same notion.Indeed, multiple studies have reported increased levels of inflammatory or coagulation markers such as IL-6, CRP, C-reactive proteins, procalcitonin, ferritin and D-dimers in more severe forms of COVID-19 compared to less severe ones [10,[61][62][63].Our data however reported that these indices increased significantly between HC, non-severe and severe patients, but not when comparing severe vs critical patients (S2 Table ).Therefore, the inflammatory markers that were reported to be indicators for COVID-19 severity, as well as the studied metabolites, exhibited minimal differences between patients in severe and critical groups.The profile of clinical symptoms was also matched between these two groups (S4F Fig) .Taken together, this suggests that the difference in the magnitude of changes in metabolites and inflammatory blood indices between severe and critical cases is minimal, pointing out that other biomarkers might be worth studying at that peak of disease severity, where additional hospitalization care (e.g.intubation) might be needed [37].
The analyses of the pathways suggest unique operating mechanisms in critical phase.For instance, aminoacyl-tRNA biosynthesis was significantly enriched in critical patients.Mining of transcriptomic and proteomic database of aminoacyl tRNA synthetases (aaRSa), essential enzymes in protein translation, revealed an overexpression of many aaRSa in response to infection with three SARS-CoV-2 viruses and that there is a physical interaction between virus M protein and members of these enzymes [64].In addition, arginine biosynthesis pathway was highly enriched in this group.Generally, l-Arginine levels have been shown to affect T cell function [65,66].These results are supported by previous studies, which found a reduced proliferation of lymphocytes in critically ill septic patients, which has been linked to a decrease in l-Arginine availability [65].

Combination of metabolites and blood parameters may enhance the stratification of critical COVID-19 patients
Given the low performance of metabolite model in predicting critical patients and the reported correlation between changes in metabolites and blood parameters [10,11], we sought to test whether addition of blood measurements to the metabolites (combined model) would by any means enhance the identification of critical COVID-19 patients.Similar ideas have been done by our group in the context of COVID-19 diagnosis [67].Indeed, addition of blood parameters to metabolites (combined model) enhanced the predictive and classification power of the metabolites model (single model) for stratifying critical patients.It was also found that the addition of more blood parameters could enhance the performance of model and by extension provide better predictor panel.Similar results were obtained previously by Sindelar et al. [10], who found increased prediction ability of metabolite plus blood model (AUC = 0.7) compared to metabolite model (AUC = 0.6).This suggests the additional clinical value of measuring blood parameters during progression of COVID-19 infection.
We acknowledge that our study has some limitations.The cross-section nature of this study does not allow following up the same patients as they progress through different disease stages, which would have rather given a more precise snapshot of metabolic changes over the disease course.Indeed, doing so is challenging during the time of pandemic.Due to financial limitations, we were not able to study other blood and urine metabolites such as sphingolipids and organic acids.It is worth noting that some of our critically ill patients were either treated at home or were admitted from other clinics making it possible that prior medications given to those patients could have introduced some bias in the measured metabolites.While our study exemplifies how large sample size would allow convenient ML model construction in targeted metabolites-based investigations, a non-targeted screening of these molecules is highly warranted since it offers a complete picture of metabolite reprogramming and enable discovering novel molecules.It is also recommended to include additional patient metadata, especially the underlying metabolic disorders, when it comes to studying metabolic alterations associated with COVID-19 infection.

Conclusion
In conclusion, the underlying changes in metabolites were more characteristic, and thus could be important predictors for patients in non-severe and severe stages, but not for those suffering critical disease.Concurrent measurements of blood parameters and key metabolites could enhance the prediction ability of metabolites in those critically ill patients.Our analyses scheme suggests panels of key metabolites that could be used as diagnostic and prognostic markers for COVID-19 infection.Subsequent wide scale validation studies could further consolidate these results and open the door for using them in clinical settings.

Fig 1 .
Fig 1.Schematic diagram showing the overall design of the study.The general steps were subject selection and stratification, data quantification, acquisition, data analyses, and model evaluation.https://doi.org/10.1371/journal.pone.0302977.g001

Fig 2 .
Fig 2. Machine learning models and univariate analyses discriminating HC from non-severe COVID-19 patients.A. Proximity plot of the RF model discriminating controls from non-sever COVID-19 patients.The ellipse shows confidence intervals B. ROC analyses showing the prediction ability of the model.AUC: Area under the curve.C. Top 15 metabolites that are important predictors for non-severe COVID-19 patients as revealed by RF model.The metabolites are color-grouped by their class and are ranked descending by their mean decrease in accuracy (the higher the mean decrease in accuracy the more important the metabolite).D. Volcano plot showing the results of the univariate analyses.The figure depicts the relationship between log2FC value of each metabolite (x-axis) against its -log10FDR (y-axis).Red and blue dots refer to metabolites that are significantly up and down regulated, respectively.Nonsignificant DE metabolites are shown as grey dots (-log10 adj.p value <1.3).The metabolite class are shape coded.The dashed horizontal line refers to the value of 1.3, the -log10 for a 0.05 FDR.The vertical dashed lines refer to the cutoff that equates a log2fold change value of |1.5|.https://doi.org/10.1371/journal.pone.0302977.g002

Fig 3 .
Fig 3. Machine learning models and univariate analyses discriminating non-severe and severe COVID-19 patients.A. Proximity plot of the RF model discriminating non-sever from severe COVID-19 patients.The ellipse shows confidence intervals B. ROC analyses showing the prediction ability of the model.AUC: Area under the curve.C. Top 15 metabolites that are important predictors for severe COVID-19 patients as revealed by RF model.The metabolites are color-grouped by their class and are ranked descending by their mean decrease in accuracy (the higher the mean decrease in accuracy the more important the metabolite).D. Volcano plot showing the results of the univariate analyses.The figure depicts the relationship between log2FC value of each metabolite (x-axis) against its -log10FDR (y-axis).Red and blue dots refer to metabolites that are significantly up and down regulated, respectively.Non-significant DE metabolites are shown as grey dots (-log10 adj.p value <1.3).The metabolite class is shape coded.The dashed horizontal line refers to the value of 1.3, the -log10 for a 0.05 FDR.The vertical dashed lines refer to the cutoff that equates a log2fold change value of |1.5|.https://doi.org/10.1371/journal.pone.0302977.g003

Fig 4 . 3 . 3 . 3 Fig 5 .
Fig 4. Machine learning models and univariate analyses discriminating severe and critical COVID-19 patients.A. Proximity plot of the RF model discriminating sever from critical COVID-19 patients.The ellipse shows confidence intervals B. ROC analyses showing the prediction ability of the model.AUC: Area under the curve.C. Top 15 metabolites that are important predictors for critical COVID-19 patients as revealed by RF model.The metabolites are color-grouped by their class and are ranked descending by their mean decrease in accuracy (the higher the mean decrease in accuracy the more important the metabolite).D. Volcano plot showing the results of the univariate analyses.The figure depicts the relationship between log2FC value of each metabolite (x-axis) against its -log10FDR (y-axis).Red and blue dots refer to metabolites that are significant up and down regulated, respectively.Non-significant DE metabolites are shown as grey dots (-log10 adj.p value <1.3).The metabolite class are shape-coded.The dashed horizontal line refers to the value of 1.3, the -log10 for a 0.05 FDR.The vertical dashed lines refer to the cutoff that equates a log2fold change value of | 1.5|.https://doi.org/10.1371/journal.pone.0302977.g004

Fig 7 .
Fig 7. Support vector machine models built using important metabolites in panel 3 (those that best discriminate severe from critical covid-19 cases) together with normalized counts of 17 blood indices.A. Various predictive accuracies as determined by support vector machine models using feature combination (from 2-20).B. ROC curves showing the predictive ability of different support vector machine models of feature combination.Var.refers to different combination.AUC: Area under the curve.CI: Confidence intervals.https://doi.org/10.1371/journal.pone.0302977.g007

Table 1 . Summary of demographic and clinical data of healthy individuals and COVID-19 patients.
*The percentages mentioned in the table represent the proportion of subjects having particular gender, score, symptoms out of the total number of subjects in the respective COVID-19 group.**P-values of numeric parameters were calculated by using two-tailed student t test with assumption of unequal variance.P-values of categorical parameters were calculated using a chi-square test (or Fisher's exact test when suitable).https://doi.org/10.1371/journal.pone.0302977.t001

Table 2 . Parameters for evaluation of each machine learning model discriminating pairs of groups using both the metabolites alone or metabolites plus blood indi- ces.
NA: Not available.PLS-DAL partial least square discriminate analyses.oPLS-DA: Orthogonal partial least square discriminate analyses.RF: Random forest.

performance Among all groups Control vs non-severe Non-severe vs severe Severe vs critical Metabolites Metabolites + blood Metabolites Metabolites + blood Metabolites Metabolites + blood Metabolites Metabolites + blood
Table and are shown Fig 3D).These included 27 up regulated and 12 down regulated metabolites.These significant DE metabolites were represented by 11 bile acids, 13 amino acids and 15 carnitines.

Table 4 ,
the pathways enrichment analyses on those 14 metabolites revealed the presence of primary bile acid biosynthesis, tyrosine metabolism and arginine biosynthesis pathways as significantly enriched.3.3.2.2 Models comparing severe and critical COVID-19 patients.To demonstrate the potential of metabolites as prognostic markers for critical COVID-19 patients, severe and critical groups were compared in the models.As shown in S10A Fig and

Severe vs critical (Panel 3)
on the same data obtained better and significant performance (P-value = 7.95e-06), yet with relatively low accuracy (0.92), sensitivity (0.93) and specificity (0.92) (Fig 4A).To test if this was an overfitting, we rerun RF model on different proportion of training and validation data

Table 4 . Pathways enriched by the predictor metabolites in each pairwise comparison.
Table), top of which were tauroursodeoxycholic acid, malonyl methylmalonyl succinylcarnitine and octenoylcarnitine.Furthermore, RF model predicted list of key metabolites with high discriminatory ability according to their mean decrease in accuracy and GINI index (Full list

Table ) .
Top 15 metabolites detected by RF and ranked by mean decrease in accuracy are shown in Fig 4C.

Table and Fig 4D)
. Combining the lists of significant metabolites that are predicted by oPLS-DA, RF models and univariate analyses obtained a panel of 2 metabolites that are important predictors for critical disease (Panel 3) (Table3 and S9Fig).